A Data-Driven Dependency Parser for Urdu
Abstract
One of the main motivations for building treebanks is that they facilitate the development of syntactic parsers by providing realistic data for evaluation as well as inductive learning. In this paper we present what we believe to be the first data-driven dependency parser for Urdu, which was developed using the MaltParser system and trained and evaluated on data from the Urdu dependency treebank. A 40,000-word corpus was manually tagged to build the dependency treebank for Urdu. Four tagging schemes were applied manually to POS-tagged data: a phrase tagset, a token position tag, a head tag for each token, and a dependency relation tagset. The parser achieves a best labeled attachment score of 74.48%, an unlabeled attachment score of 90.14%, and a label attachment score of 76.38%. We present a partial error analysis, focusing on accuracy for different chunks and dependencies.
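As a rough illustration of how the three scores reported above relate to one another, the following sketch computes them for a single toy sentence under the usual definitions. The function name `attachment_scores`, the toy heads, and the labels are assumptions made for this example and are not taken from the paper or its tagset. The reading of "label attachment" as the share of tokens with the correct label is also an assumption, and the paper's own evaluation may differ in details such as punctuation handling.

```python
from typing import List, Tuple

# Each token is (head_index, dependency_label); head 0 denotes the root.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float, float]:
    """Return (labeled attachment, unlabeled attachment, label score) as fractions."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sentence")

    n = len(gold)
    # Unlabeled attachment: the predicted head matches the gold head.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # Label score: the predicted label matches the gold label.
    label_ok = sum(g[1] == p[1] for g, p in zip(gold, pred))
    # Labeled attachment: both head and label match.
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return both_ok / n, head_ok / n, label_ok / n


# Hypothetical four-token sentence (labels are illustrative only).
gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "mod")]
pred = [(2, "subj"), (0, "root"), (2, "mod"), (2, "mod")]

las, uas, la = attachment_scores(gold, pred)
print(f"LAS={las:.2f} UAS={uas:.2f} LA={la:.2f}")
```

In this toy case the third token has the right head but the wrong label, and the fourth has the right label but the wrong head, so the labeled score is lower than either of the other two. The same pattern appears in the reported figures, where the labeled attachment score is the lowest of the three.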
Similar resources
Urdu Dependency Parser: A Data-Driven approach
In this paper, we present what we believe to be the first data-driven dependency parser for Urdu. The parser was trained and tuned using the MaltParser system, a system for data-driven dependency parsing. The Urdu dependency treebank (UDT), which is used for training and testing the Urdu dependency parser, is also presented for the first time. The UDT contains a corpus of 2853 sentences which are annotated at mu...
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...
Exploiting Language Variants Via Grammar Parsing Having Morphologically Rich Information
In this paper, the development and evaluation of the Urdu parser is presented, along with a comparison of existing resources for the language variants Urdu/Hindi. The parser was given a linguistically rich grammar extracted from a treebank. This context-free grammar, with sufficient encoded information, is comparable with the state-of-the-art parsing requirements for morphologically rich and cl...
Building Computational Resources: The URDU.KON-TB Treebank and the Urdu Parser
This work presents the development of the URDU.KON-TB treebank, its annotation evaluation and guidelines, and the construction of the Urdu parser for Urdu, a South Asian language. Urdu is a comparatively under-resourced language, and the development of a reliable treebank and a parser will have a significant impact on the state of the art in automatic Urdu language processing. The work includes the ...
Improving data-driven dependency parsing using large-scale LFG grammars
This paper presents experiments which combine a grammar-driven and a data-driven parser. We show how the conversion of LFG output to dependency representation allows for a technique of parser stacking, whereby the output of the grammar-driven parser supplies features for a data-driven dependency parser. We evaluate on English and German and show significant improvements stemming from the propose...